forked from TransformerLensOrg/TransformerLens
Refactor components #6
Open: bryce13950 wants to merge 77 commits into main from refactor-components
Conversation
This reverts commit 91a5712.
This reverts commit cecf93e.
This reverts commit 9d8e91a.
This reverts commit ef4518b.
This reverts commit beb014e.
Remove the PyTorch versioning fix, as this has been solved in the latest PyTorch version. Also format with Even Better TOML so that pyproject.toml is easier to read.
…sformerLensOrg#477)
* Fixing numerical issues
* Added qwen lol
* setup local
* allclose
* Added qwen
* Cleaned up implementation
* removed untested models
* Cleaned up implementation, removed untested models
* commented untested models
* formatting
* fixed mem issues + trust_remote_code
* formatting
* merge
* Force rerun checks

Co-authored-by: Andy Arditi <andyrdt@gmail.com>
* Add a function to convert nanogpt weights
* Remove need for bias parameter
* Add Support for CodeLlama-7b
* Reformat

Co-authored-by: Neel Nanda <neelnanda27@gmail.com>
Co-authored-by: Alan <41682961+alan-cooney@users.noreply.github.com>
* add LlamaForCausalLM arch. parsing and 01-ai/Yi
* fix attn bias dim error
* fix attn dim error... again
* add chat models
* format
* add sentencepiece for yi-chat tokenizers
* update poetry.lock
* update gqa comment
* update poetry.lock

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* make cspell not mad
* add new init methods: kaiming, xavier, and (incomplete) MuP initializations
* Various small typo, comment, and bug fixes
* tests for inits
* more cspell edits so it's happy
* run black with default -l 88
* fix to make docs compile properly
* accidently is not a word, whoops
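For context, a minimal sketch of what the new initialization options might look like in use, assuming they are selected via an `init_mode` field on `HookedTransformerConfig`; the mode name shown is illustrative, not confirmed:

```python
# Minimal sketch: choosing a weight-initialization scheme at model creation.
# Assumes the new modes are exposed through HookedTransformerConfig.init_mode;
# the "kaiming_uniform" value is an assumption about the naming.
from transformer_lens import HookedTransformer, HookedTransformerConfig

cfg = HookedTransformerConfig(
    n_layers=2,
    d_model=128,
    n_ctx=256,
    d_head=32,
    d_vocab=50257,
    act_fn="gelu",
    init_mode="kaiming_uniform",  # assumed flag; the prior default was GPT-2-style init
)
model = HookedTransformer(cfg)  # weights are initialized according to init_mode
```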
* chore: fixing type errors and enabling mypy
* updated pyproject
* fixing typing after merging updates
* fixed correct typing for float

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* add moe config options
* bump transformers version, needed for hf mixtral
* add architecture config
* add moe component, no hooks yet
* add convert_mixtral_weights
* formatting
* fix convert_mixtral_weights
* fixes
* rename moe state_dict names
* add multi-gpu fixes by @coolvision
* fix einsum
* fix moe forward pass
* cap mixtral context, model working
* disable ln folding for moe (for now)
* update htconfig docstring with moe options
* formatting
* add benchmarker to test_hooked_transformer
* add moe gate and chosen expert hooks
* formatting
* add moe dtype warning
* add special cases page to docs
* formatting
* fix missing .cfg
* fix doc heading level, add desc. to moe hook points
* fix formatting
* fix new mypy errors
* fix mypy issues for real this time
* rename moe gate hook names

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
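As a rough illustration of the MoE hook points this commit series adds, a hedged sketch of caching a Mixtral run and listing gate/expert activations; the model name is real, but since the gate hooks were renamed during the PR their exact names are assumptions, so the sketch filters cache keys by substring:

```python
# Hedged sketch: run Mixtral under run_with_cache and list captured MoE
# activations. Exact hook names are assumptions (renamed during this PR),
# so we filter the cache by substring rather than hardcoding them.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    dtype=torch.bfloat16,  # the PR adds an MoE dtype warning; bf16 assumed safe
)
_, cache = model.run_with_cache("The quick brown fox")
moe_keys = [k for k in cache.keys() if "expert" in k or "gate" in k]
print(moe_keys)
```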
…ings (TransformerLensOrg#538)
* Update black line length to 100
* run black with -l 100
* edit contributing.md to include new line length
* add black -l 100 to .vscode for convenience
* fixed merge saving error
* fixed merge issue in params
* ran format
* ran format on tests

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* Refactor hook_points
* restored remaining refactor
* ran format
* added partial registering again
* restored prepend
* added type comment again
* fixed spacing

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
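To make the hook_points surface concrete, a small sketch of the two behaviors this refactor restores, registering a functools.partial as a hook and prepending it, using standard TransformerLens calls:

```python
# Sketch of the hook_points API touched by this refactor: a hook built with
# functools.partial, registered with prepend=True so it runs before any
# hooks already attached at that point.
from functools import partial
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

def scale_activation(tensor, hook, scale=1.0):
    # TransformerLens hooks receive (activation, hook) and may return a
    # modified activation.
    return tensor * scale

model.add_hook(
    "blocks.0.hook_resid_post",
    partial(scale_activation, scale=0.5),
    prepend=True,
)
logits = model("Hello world")
model.reset_hooks()
```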
* qkv initial fix
* add test and update BertBlock
* formatting changes
* fix flaky gqa test
* move helper function to utils
* ran reformat

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* fixed install version and key name
* fixed remaining issues with no position experiment
* removed extra key
* fixed install version and key name
* fixed remaining issues with no position experiment
* removed extra key
* fixed othello in colab
* added optional token to transformers loading
* added secret for make docs command
* ran format
* added gated models instructions
* rearranged env setting
* moved hf token
* added temporary log
* changed secret reference
* changed env variable reference
* changed token reference
* changed back to secrets reference
* removed microsoft models from remote code list
* updated token again
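A hedged sketch of what the optional-token path enables, loading a gated checkpoint; the commit threads a token through to the underlying transformers loading code, and reading it from the HF_TOKEN environment variable is an assumption about the mechanism:

```python
# Hedged sketch: authenticate to Hugging Face before loading a gated model.
# Whether the token is read from the HF_TOKEN env var or passed explicitly
# may differ from what is shown here.
import os
from transformer_lens import HookedTransformer

os.environ["HF_TOKEN"] = "hf_..."  # placeholder for your access token
model = HookedTransformer.from_pretrained("meta-llama/Llama-2-7b-hf")
```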
* Start work on adding llama.
* Remove v2 from arxiv URL.
* Remove llama special case (breaks because hf_config is not defined).
* Remove TODO. llama-2-70b-hf and Llama 3 models all have n_key_value_heads set, so they'll use Grouped-Query Attention.
* Add back check for non-hf-hosted models.
* Hardcode Llama-3 configs. See discussion on TransformerLensOrg#549 for why.

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
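Since the commit notes that models with n_key_value_heads set use Grouped-Query Attention, a short sketch of checking that on a hardcoded Llama-3 config; the model name matches the Llama-3 release, and the model.cfg access pattern is standard TransformerLens:

```python
# Sketch: load Llama-3 from its hardcoded config and confirm it uses
# Grouped-Query Attention (fewer key/value heads than query heads).
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("meta-llama/Meta-Llama-3-8B")
print(model.cfg.n_heads)            # number of query heads
print(model.cfg.n_key_value_heads)  # set for this model, so GQA is in use
```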
* working demo of 4bit quantized Llama
* add memory info to the demo
* cleanup, asserts for quantization
* hooks reading/writing
* test in colab; do not import Int8Params
* add some comments
* format; fix optional argument use
* merge with main
* format
* ran format
* locked attribution patching to 1.1.1
* fixed demo for current colab
* minor typing fixes for mypy
* fixing typing issue
* removing extra W_Q W_O
* ignored merge artifacts & push for proper CI run

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: hannamw <mh2parker@gmail.com>
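A hedged sketch of the 4-bit path demonstrated here; it assumes a load_in_4bit flag on from_pretrained (backed by bitsandbytes on a CUDA device) and that hooks still work for reading activations, as the commit's "hooks reading/writing" item suggests:

```python
# Hedged sketch: load a 4-bit quantized Llama and verify hooks still fire.
# load_in_4bit is an assumed flag name; requires bitsandbytes and a GPU.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    load_in_4bit=True,  # assumed flag; weights quantized via bitsandbytes
)
_, cache = model.run_with_cache("Hooks still read activations under 4-bit.")
print(cache["blocks.0.hook_resid_post"].shape)
```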
* removed duplicate rearrange block
* removed duplicate variables
* fixed param name
* revised demo testing to check all demos
* separated demos
* changed demo test order
* rearranged test order
* updated attribution patching to run different code in github
* rearranged tests
* updated header
* updated grokking demo
* updated bert for testing
* updated bert demo
* ran cells
* removed github check
* removed cells to skip
* ignored output of loading cells
* removed other tests
* implement HookedSAETransformer
* clean up imports
* apply format
* only recompute error if use_error_term
* add tests
* run format
* fix import
* match to hooks API
* improve doc strings
* improve demo
* address Arthur feedback
* try to fix indent
* try to fix indent again
* change doc code block
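To give a feel for the new class, a hedged sketch of running a model with an attached SAE; HookedSAETransformer, run_with_saes, and use_error_term follow the commit, while the HookedSAE construction details (config fields, sizes) are assumptions, and a real SAE would load trained weights rather than starting fresh:

```python
# Hedged sketch: attach a sparse autoencoder to one hook point and run the
# model through it. Config field names and sizes are assumptions.
from transformer_lens import HookedSAE, HookedSAEConfig, HookedSAETransformer

model = HookedSAETransformer.from_pretrained("gpt2")

sae_cfg = HookedSAEConfig(
    d_sae=1024,            # SAE hidden size (illustrative)
    d_in=768,              # must match the hooked activation's width for gpt2
    hook_name="blocks.0.hook_resid_pre",
    use_error_term=True,   # add back the reconstruction error, per the commit
)
sae = HookedSAE(sae_cfg)   # untrained here; load trained weights in practice

logits = model.run_with_saes("Hello world", saes=[sae])
```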
* reworked CI to publish code coverage report
* added coverage report to docs
* added support for python 3.12 and removed extra steps on legacy versions of python
* moved main check back to python 3.11
* removed coverage flag
* moved download command
* fixed name
* specified file name
* removed link
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist: